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Abstract — We propose a new fast algorithm for solving one of 
the standard formulations of image restoration and reconstruc- 
tion which consists of an unconstrained optimization problem 
where the objective includes an £2 data-fidelity term and a non- 
smooth regularizer. This formulation allows both wavelet-based 
(with orthogonal or frame-based representations) regularization 
or total-variation regularization. Our approach is based on a 
variable splitting to obtain an equivalent constrained optimiza- 
tion formulation, which is then addressed with an augmented 
Lagrangian method. The proposed algorithm is an instance of 
the so-called alternating direction method of multipliers, for which 
convergence has been proved. Experiments on a set of image 
restoration and reconstruction benchmark problems show that 
the proposed algorithm is faster than the current state of the art 
methods. 

I. Introduction 

A. Problem Formulation 

Image restoration/reconstruction is one of the earliest and 
most classical linear inverse problems in imaging, dating back 
to the 1960's [1]. In this class of problems, a noisy indirect 
observation y, of an original image x, is modeled as 

y = Bx + n, 

where B is the matrix representation of the direct operator 
and n is noise. As is common, we are adopting the vector 
notation for images, where the pixels on an M x N image are 
stacked into a an (TVM)-vector in, e.g., lexicographic order. 
In the sequel, we denote by n the number of elements of x, 
thus x 6 W l , while y 6 E m (to and n may or may not be 
equal). 

In the particular case of image deblurring/deconvolution, B 
is the matrix representation of a convolution operator; if this 
convolution is periodic, B is then a (block) circulant matrix. 
This type of observation model describes well several physical 
mechanisms, such as relative motion between the camera and 
the subject (motion blur), bad focusing (defocusing blur), or 
a number of other mechanisms which are well modeled by a 
convolution. 

In more general image reconstruction problems, B rep- 
resents some linear direct operator, such as a set of tomo- 
graphic projections (Radon transform), a partially observed 
(e.g., Fourier) transform, or the loss of part of the image pixels. 

It is well known that the problem of estimating x from 
y is ill-posed, thus this inverse problem can only be solved 
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satisfactorily by adopting some sort of regularization (or prior 
information, in Bayesian inference terms). One of the stan- 
dard formulations for wavelet-based regularization of image 
restoration/reconstruction problems is built as follows. Let the 
unknown image x be represented as a linear combination of 
the elements of some frame, i.e., x = W/3, where (3 £ M. d , 
and the columns of the n x d matrix W are the elements 
of a wavelet 1 frame (an orthogonal basis or a redundant 
dictionary). Then, the coefficients of this representation are 
estimated from the noisy image, under one of the well-known 
sparsity inducing regularizers, such as the l\ norm (see [15], 
[18], [21], [22], [23], and further references therein). Formally, 
this leads to the following optimization problem: 

3 = argimni||BW/3-y||!+T<K/3) (D 
P 1 

where </> : R d — > M, usually called the regularizer or 
regularization function is usually nonsmooth, or maybe even 
nonconvex, and r > is the regularization parameter. This 
formulation is referred to as the synthesis approach [19], since 
it is based on a synthesis equation where x is synthesized from 
its representation coefficients (x = W/3) with are the object 
of the estimation criterion. Of course, the final image estimate 
is computed as x = W/3. 

An alternative formulation applies a regularizer directly to 
the unknown image, leading to criteria of the form 

x = argmin-||Bx-y||2+T0(x) (2) 
x 2 

where </> : K™ — > R is the regularizer. This type of criteria 
are usually called analysis approaches, since they're based 
on a regularizer that analyzes the image itself, </>(x), rather 
than the coefficients of a representation thereof. Arguably, the 
best known and most often used regularizer used in analysis 
approaches to image restoration is the total variation (TV) 
norm [40], [11]. Wavelet-based analysis approaches are also 
possible [19], but will not be considered in this paper. 

Finally, it should be mentioned that problems (1) and (2) 
can be seen as the Lagrangians of associated constrained 
optimization problems: (1) is the Lagrangian of the constrained 
problem 

min 0(/3) subject to ||BW/3 - y||| < e, (3) 
P* 

while (2) is the Lagrangian of 

min</>(x) subject to ||Bx — y||| < e. (4) 

X 

'We will use the generic term "wavelet" to mean any wavelet-like multi- 
scale representation, such as "curvelets", "beamlets", or "ridgelets". 
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Specifically, a solution of (3) (for any e such that this problem 
is feasible) is either the null vector, or else is a minimizer 
of (1), for some r > (see [39, Theorem 27.4]). A similar 
relationship exists between problems (2) and (4). 

B. Previous Algorithms 

For any problem of non-trivial dimension, matrices BW, 
B, and W cannot be stored explicitly, and it is costly, even 
impractical, to access portions (lines, columns, blocks) of 
them. On the other hand, matrix-vector products involving 
B or W (or their conjugate transposes B H and W H ) can 
be done quite efficiently. For example, if the columns of 
W contain a wavelet basis or a tight wavelet frame, any 
multiplication of the form Wv or W ff v can be performed 
by a fast wavelet transform algorithm [34]. Similarly, if B 
represents a convolution, products of the form Bv or B H v 
can be performed with the help of the fast Fourier transform 
(FFT) algorithm. These facts have stimulated the development 
of special purpose methods, in which the only operations 
involving B or W (or their conjugate transposes) are matrix- 
vector products. 

To present a unified view of algorithms for handling (1) and 
(2), we write them in a common form 

mini||Ax-y||2 + T^(x) (5) 
x 2 

where A = BW, in the case of (1), while A = B, for (2). 

Arguably, the standard algorithm for solving problems of the 
form (5) is the so-called iterative shrinkage/thresholding (1ST) 
algorithm. 1ST can be derived as an expectation-maximization 
(EM) algorithm [22], as a majorization-minimization (MM, 
[29]) method [15], [23], or as a forward-backward splitting 
technique [13], [27]. A key ingredient of 1ST algorithms is 
the so-called shrinkage/thresholding function, also known as 
the Moreau proximal mapping [13] or the denoising function, 
associated to the regularizer <fi, which provides the solution 
of the corresponding pure denoising problem. Formally, this 
function is denoted as : R m — > M m and defined as 

*r0(y) = argrnin^llx- y||| +t0(x). (6) 

Notice that if <f> is proper and convex, the function being 
minimized is proper and strictly convex, thus the minimizer 
exists and is unique making the function well defined [13]. 

For some choices of <fi, the corresponding denoising func- 
tions have well known closed forms. For example, 
choosing <j>(x) = ||x||i = X^l^l' tne ^i norm > leads to 
*r£i (y) = soft(y, r), where soft(-, r) denotes the component- 
wise application of the function y sign(y) max{|j/| — r, 0}. 

If </>(x) = ||x||o = \{i ■ Xi 7^ 0}|, usually referred to as 
the £q "norm" (although it is not a norm), despite the fact 
that this regularizer is not convex, the corresponding shrink- 
age/thresholding function also has a simple close form: the 
so-called hard-threshold function, *& T e (y) = hard(y, \/2r), 
where hard(-,a) denotes the component-wise application of 
the function y yl\ y \ >a . A comprehensive coverage of 
Moreau proximal maps can be found in [13]. 



Each 1ST iteration for solving (5) is given by 

x fe+ i = * T0 (xt -^ A " ( Ax fe - y)) . (7) 

where I/7 is a step size. Notice that A H (Ax^ — y) is the 
gradient of the data-fidelity term (1/2)|| Ax — y|||, computed 
at Xfc; thus, each 1ST iteration takes a step of length 1/7 
in the direction of the negative gradient of the data-fidelity 
term, followed by the application of the shrinkage/thresholding 
function associated with the regularizer <j>. 

It has been shown that if 7 > ||A||2/2 and <j> is convex, the 
algorithm converges to a solution of (1) [13]. However, it is 
known that 1ST may be quite slow, specially when r is very 
small and/or the matrix A is very ill-conditioned [4], [5], [21], 
[27]. This observation has stimulated work on faster variants 
of 1ST, which we will briefly review in the next paragraphs. 

In the two-step 1ST (TwIST) algorithm [5], each iterate 
depends on the two previous iterates, rather than only on the 
previous one (as in 1ST). This algorithm may be seen as a 
non-linear version of the so-called two-step methods for linear 
problems [2]. TwIST was shown to be considerably faster 
than 1ST on a variety of wavelet-based and TV-based image 
restoration problems; the speed gains can reach up two orders 
of magnitude in typical benchmark problems. 

Another two-step variant of 1ST, named fast 1ST algorithm 
(FISTA), was recently proposed and also shown to clearly 
outperform 1ST in terms of speed [4]. FISTA is a non-smooth 
variant of Nesterov's optimal gradient-based algorithm for 
smooth convex problems [35], [36]. 

A strategy recently proposed to obtain faster variants of 
1ST consists in relaxing the condition 7 > 7 min = HAH2/2. In 
the SpaRSA (standing for sparse reconstruction by separable 
approximation) framework [44], [45], a different j t is used in 
each iteration (which may be smaller than 7 min , meaning larger 
step sizes). It was shown experimentally that SpaRSA clearly 
outperforms standard 1ST. A convergence result for SpaRSA 
was also given in [45]. 

Finally, when the slowness is caused by the use of a small 
value of the regularization parameter, continuation schemes 
have been found quite effective in speeding up the algorithm. 
The key observation is that 1ST algorithm benefits significantly 
from warm-starting, i.e., from being initialized near a mini- 
mum of the objective function. This suggests that we can use 
the solution of (5), for a given value of r, to initialize 1ST 
in solving the same problem for a nearby value of t. This 
warm-starting property underlies continuation schemes [24], 
[27], [45]. The idea is to use 1ST to solve (1) for a larger value 
of t (which is usually fast), then decrease r in steps toward its 
desired value, running 1ST with warm-start for each successive 
value of r. 

C. Proposed Approach 

The approach proposed in this paper is based on the 
principle of variable splitting, which goes back at least to 
Courant in the 40's [14], [43]. Since the objective function 
(5) to be minimized is the sum of two functions, the idea is 
to split the variable x into a pair of variables, say x and v, 
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each to serve as the argument of each of the two functions, 
and then minimize the sum of the two functions under the 
constraint that the two variables have to be equal, so that the 
problems are equivalent. Although variable splitting is also the 
rationale behind the recently proposed split-Bregman method 
[25], in this paper, we exploit a different type of splitting to 
attack problem (5). Below we will explain this difference in 
detail. 

The obtained constrained optimization problem is then dealt 
with using an augmented Lagrangian (AL) scheme [37], which 
is known to be equivalent to the Bregman iterative methods re- 
cently proposed to handle imaging inverse problems (see [46] 
and references therein). We prefer the AL perspective, rather 
than the Bregman iterative view, as it is a standard and more 
elementary optimization tool (covered in most textbooks on 
optimization). In particular, we solve the constrained problem 
resulting from the variable splitting using an algorithm known 
as alternating direction method of multipliers (ADMM) [17]. 

The application of ADMM to our particular problem in- 
volves solving a linear system with the size of the unknown 
image (in the case of problem (2)) or with the size of its 
representation (in the case of problem (1)). Although this 
seems like an unsurmountable obstacle, we show that it is 
not the case. In many problems of the form (2), such as 
deconvolution, recovery of missing samples, or reconstruction 
from partial Fourier observations, this system can be solved 
very quickly in closed form (with 0(n) or 0(n log n) cost). 
For problems of the form (1), we show how exploiting the 
fact that W is a tight Parseval frame, this system can still be 
solved efficiently (typically with 0(n log n) cost. 

We report results of a comprehensive set of experiments, on 
a set of benchmark problems, including image deconvolution, 
recovery of missing pixels, and reconstruction from partial 
Fourier transform, using both frame-based and TV-based reg- 
ularization. In all the experiments, the resulting algorithm is 
consistently and considerably faster than the previous state of 
the art methods FISTA [4], TwIST [5], and SpaRSA [45]. 

The speed of the proposed algorithm, which we term 
SALSA {split augmented Lagrangian shrinkage algorithm), 
comes from the fact that it uses (a regularized version of) the 
Hessian of the data fidelity term of (5), that is, A H A, while 
the above mentioned algorithms essentially only use gradient 
information. 

D. Organization of the Paper 

Section II describes the basic ingredients of SALSA: vari- 
able splitting, augmented Lagrangians, and ADMM. In Section 
III, we show how these ingredients are combined to obtain the 
proposed SALSA. Section IV reports experimental results, and 
Section V ends the paper with a few remarks and pointers to 
future work. 

II. Basic Ingredients 

A. Variable Splitting 

Consider an unconstrained optimization problem in which 
the objective function is the sum of two functions, one of 



which is written as the composition of two functions, 

min/ 1 (u)+/ 2 (. g (u)), (8) 

where g : M™ — > M. d . Variable splitting is a very simple 
procedure that consists in creating a new variable, say v, 
to serve as the argument of fa, under the constraint that 
g(u) = v. This leads to the constrained problem 

min /i(u) + / 2 (v) 

uGR™,vGR d (9) 

subject to g(u) = v, 

which is clearly equivalent to unconstrained problem (8): 
in the feasible set {(u, v) : g(u) = v}, the objective 
function in (9) coincides with that in (8). The rationale behind 
variable splitting methods is that it may be easier to solve the 
constrained problem (9) than it is to solve its unconstrained 
counterpart (8). 

The splitting idea has been recently used in several image 
processing applications. A variable splitting method was used 
in [43] to obtain a fast algorithm for TV-based image restora- 
tion. Variable splitting was also used in [6] to handle problems 
involving compound regularizes; i.e., where instead of a 
single regularizer t</>(x) in (5), one has a linear combination 
of two (or more) regularizes ti</>i(x) + T202(x). In [6] and 
[43], the constrained problem (9) is attacked by a quadratic 
penalty approach, i.e., by solving 

min A(u) + / 2 (v) + | \\g(u) - v|||, (10) 

uGR",vSR d 2 

by alternating minimization with respect to u and v, while 
slowly taking a to very large values (a continuation process), 
to force the solution of (10) to approach that of (9), which in 
turn is equivalent to (8). The rationale behind these methods 
is that each step of this alternating minimization may be 
much easier than the original unconstrained problem (8). The 
drawback is that as a becomes very large, the intermediate 
minimization problems become increasingly ill-conditioned, 
thus causing numerical problems (see [37], Chapter 17). 

A similar variable splitting approach underlies the recently 
proposed split-Bregman methods [25]; however, instead of 
using a quadratic penalty technique, those methods attack 
the constrained problem directly using a Bregman iterative 
algorithm [46]. It has been shown that, when g is a linear 
function, i.e., g(u) = Gu, the Bregman iterative algorithm is 
equivalent to the augmented Lagrangian method [46], which 
is briefly reviewed in the following subsection. 

B. Augmented Lagrangian 

Consider the constrained optimization problem 
min E(z) 

zGR" (11) 

s.t. Az b =0, 

where beR p and A e R px ", there are p linear equality 
constraints. The so-called augmented Lagrangian function for 
this problem is defined as 

C A (z, X, - E(z) + A T (b - Az) + | 1| Az - b|||, (12) 
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where A e R p is a vector of Lagrange multipliers and p, > 
is called the penalty parameter [37]. 

The so-called augmented Lagrangian method (ALM) [37], 
also known as the method of multipliers (MM) [28], [38], 
consists in minimizing £a(z, A, fi) with respect to z, keeping 
A fixed, then updating A, and repeating these two steps 
until some convergence criterion is satisfied. Formally, the 
ALM/MM works as follows: 

Algorithm ALM/MM 

1. Set k = 0, choose \x > 0, z , and Ao. 

2. repeat 

3. z k+1 £ argrnin z £ A (z, X k , p) 

4. Afc+i = A fc + /i(b - Az fe+ i) 

5. k^k + l 

6. until stopping criterion is satisfied. 

It is also possible (and even recommended) to update the 
value of (i in each iteration [37], [3, Chap. 9]. However, unlike 
in the quadratic penalty approach, the ALM/MM does not 
require /i to be taken to infinity to guarantee convergence to 
the solution of the constrained problem (11). 

Notice that (after a straightforward complete-the-squares 
procedure) the terms added to E(z) in the definition of the 
augmented Lagrangian £a(z, Afc,/i) in (12) can be written 
as a single quadratic term (plus a constant independent of z, 
thus irrelevant for the ALM/MM), leading to the following 
alternative form of the algorithm (which makes clear its 
equivalence with the Bregman iterative method [46]): 

Algorithm ALM/MM ( version II) 



1. 
2. 
3. 
4. 
5. 
6. 



Set k = 0, choose \i > and d . 
repeat 

z fc+ i £ argmin z E(z) + f ||Az - 
dfc+i = dfe + (b Az fc+1 ) 
k^k + l 
until stopping criterion is satisfied. 



Ifc|l2 



It has been shown that, with adequate initializations, the 
ALM/MM generates the same sequence as a proximal point 
algorithm applied to the Lagrange dual of problem (11) [30]. 
Moreover, the sequence {dfe} converges to a solution of this 
dual problem and all cluster points of the sequence {zfc} are 
solutions of the (primal) problem (11) [30]. 

C. ALM/MM for Variable Splitting 

We now show how the ALM/MM can be used to address 
problem (9), in the particular case where g(u) — Gu, i.e., 



min /i(u) + / 2 (v) 

ueR", veR d 



(13) 



subject to G u = v, 

where G e M dxrl . Problem (13) can be written in the form 
(11) using the following definitions: 



u 

V 



0. 



G II 



and 



S(z) = / 1 (u) + / 2 (v). 



(14) 
(15) 



With these definitions in place, Steps 3 and 4 of the ALM/MM 
(version II) can be written as follows: 

(u fe+ i,Vfe+i) £ argmin /i(u) + / 2 (v) + 



|Gu-v-d fc ||2 (16) 



ife+i 



— dfe + Gu fe+1 - v fc+1 



(17) 



The minimization problem (16) is not trivial since, in 
general, it involves non-separable quadratic and possibly non- 
smooth terms. A natural to address (16) is to use a non- 
linear block-Gauss-Seidel (NLBGS) technique, in which (16) 
is solved by alternatingly minimizing it with respect to u and 
v, while keeping the other variable fixed. Of course this raises 
several questions: for a given dfe, how much computational 
effort should be spent in approximating the solution of (16)? 
Does this NLBGS procedure converge? Experimental evidence 
in [25] suggests that an efficient algorithm is obtained by 
running just one NLBGS step. It turns out that the resulting 
algorithm is the so-called alternating direction method of 
multipliers (ADMM) [17], which works as follows: 



Algorithm ADMM 

1. Set k = 0, choose [i > 0, v , and d . 

2. repeat 

3. Ufe + i £ argmin u /i(u) + § ||Gu- v fe — d fc ||| 

4. v fe+1 £ argmin v / 2 (v) + §||Gu fe +i v d fe ||| 
5- dfe +1 = dfe + Gu fc+1 — Vfe +1 

6. k^k + 1 

7. until stopping criterion is satisfied. 

For later reference, we now recall the theorem by Eckstein 
and Bertsekas, in which convergence of (a generalized version 
of) ADMM is shown. This theorem applies to problems of the 
form (8) with g(u) = Gu, i.e., 



min/i(u) + / 2 (Gu), 



(18) 



of which (13) is the constrained optimization reformulation. 

Theorem 1 (Eckstein-Bertsekas, [17]): Consider problem 
(18), where f\ and / 2 are closed, proper convex functions, 



and G £ 



vdxn 



has full column rank. Consider arbitrary 



(i > and v ,d £ R d . Let {t] k > 0, k = 0,1,...} and 
{v k > 0, k = 0, 1, ...} be two sequences such that 

oo oo 

^ r/k < oo and ^ v k < oo. 



fc=0 



fc=0 



Consider three sequences {ufe £ M. n , k = 0,1,...}, {vfe 
R d , k = 0,1,...}, and {dfe £ R d , k = 0,1, ...} that satisfy 



dfe+i 



> 



> 



Ufc+i - argmin /i(u) + ^||Gu-Vfe-d fc ||2 
u 2 



Vfe+i - argmin / 2 (v) + -||Gu fe+ i-v- 
v 2 

dfe + Gufe + i - Vfe+i. 



dfe|| 



Then, if (18) has a solution, the sequence {ufe} converges, 
Ufe — > u*, where u* is a solution of (18). If (18) does not have 
a solution, then at least one of the sequences {v fc } or {d k } 
diverges. 
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Notice that the ADMM algorithm defined above generates 
sequences {life}, {v,t}, and {d^} which satisfy the conditions 
in Theorem 1 in a strict sense (i.e., with rjk = /Ltfc = 0). One 
of the important consequences of this theorem is that it shows 
that it is not necessary to exactly solve the minimizations in 
lines 3 and 4 of ADMM; as long as sequence of errors is 
absolutely summable, convergence is not compromised. 

The proof of Theorem 1 is based on the equivalence 
between ADMM and the so-called Douglas-Rachford splitting 
method (DRSM) applied to the dual of problem (18). The 
DRSM was recently used for image recovery problems in 
[12]. For recent and comprehensive reviews of ALM/MM, 
ADMM, DRSM, and their relationship with Bregman and 
split-Bregman methods, see [26], [42]. 

III. Proposed Method 

A. Constrained Optimization Formulation of Image Recovery 

We now return to the unconstrained optimization formu- 
lation of regularized image recovery, as defined in (5). This 
problem can be written in the form (18), with 

/i(x) = ^||Ax-y||i (19) 
/ 2 (x) = r#x) (20) 
G = I. (21) 

The constrained optimization formulation is thus 

min ill Ax — yll, + t6(v) 
x.vGR" 2,1 (22) 

subject to x = v. 

At this point, we are in a position to clearly explain the 
difference between this formulation and the splitting exploited 
in split-Bregman methods (SBM) for image recovery [25]. 
In those methods, the focus of attention is a non-separable 
regularizer that can be written as <p(x) = ip(Gx), as is 
the case of the TV norm. The variable splitting used in 
SBM addresses this non-separability by defining the following 
constrained optimization formulation: 

min i||Ax — yll 2 . + ™(v) 
x,veR" 211 J " 2 ^ y ' (23) 

subject to D x = v. 

In contrast, we assume that the Moreau proximal mapping 
associated to the regularizer <j>, i.e., the function \£ r ^( - ) 
defined in (6), can be computed efficiently. The goal of our 
splitting is not to address the difficulty raised by a non- 
separable and non-quadratic regularizer, but to exploit second 
order (Hessian) information of the function f\, as will be 
shown below. 

B. Algorithm and Its Convergence 

Inserting the definitions given in (19)— (21) in the ADMM 
presented in the previous section yields the proposed SALSA 
(split augmented Lagrangian shrinkage algorithm). 

Algorithm SALSA 

1. Set k = 0, choose \i > 0, v , and d . 

2. repeat 



3. x' fe = v fe + d fc 

4. x k+1 = argmin x ||Ax- y\\% +/z||x- x'J 2 , 
5- v' fe = x fe+1 - d fc 

6. v fe+ i = argmin v T(/)(v) + f||v- v' k \\% 

7. d k+ i = d fe + x fe+ i - v k+ i 

8. k^k + 1 

9. until stopping criterion is satisfied. 

Notice that SALSA is an instance of ADMM with G = I; 
thus, the full column rank condition on G in Theorem 1 is 
satisfied. If the minimizations in lines 4 and 6 are solved 
exactly, we can then invoke Theorem 1 to guarantee to 
convergence of SALSA. 

In line 4 of SALSA, a strictly convex quadratic function has 
to be minimized; which leads to the following linear system 

x fe+1 = (A H A + ii I) _1 (A H y + f i x' fe ) . (24) 

As shown in the next subsection, this linear system can be 
solved exactly (naturally, up to numerical precision), i.e., non- 
iteratively, for a comprehensive set of situations of interest. 
The matrix A H A + fj,I can be seen as a regularized (by 
the addition of /xl) version of the Hessian of /i(x) = 
(1/2)|| Ax — yll 2 ,, thus SALSA does use second order in- 
formation of this function. Notice also that (24) is formally 
similar to the maximum a posteriori (MAP) estimate of x, 
from observations y = Ax + n (where n is white Gaussian 
noise of variance l//u) under a Gaussian prior of mean x' k and 
covariance I. 

The problem in line 6 is, by definition, the Moreau proximal 
mapping of cj) applied to v' k , thus its solution can be written 

as 

v fc+1 - * T0/M K). (25) 

If this mapping can be computed exactly in closed form, for 
example, if <fi(x) = ||x||i thus is simply a soft threshold, 
then, by Theorem 1, SALSA is guaranteed to converge. If 
\& does not have a closed form solution and requires itself 
an iterative algorithm (e.g., if <j> is the TV norm), then 
convergence of SALSA still holds if one can guarantee that 
the error sequence v k (see Theorem 1) is summable. This can 
be achieved (at least approximately) if the iterative algorithm 
used to approximate \& is initialized with the result of the 
previous outer iteration, and a decreasing stopping threshold 
is used. 

C. Computing x k +\ 

As stated above, we are interested in problems where it is 
not feasible to explicitly form matrix A; this might suggest 
that it is not easy, or even feasible, to compute the inverse in 
(24). However, as shown next, in a number of problems of 
interest, this inverse can be computed very efficiently. 

1) Deconvolution with Analysis Prior: In this case we 
have A = B (see (1), (2), and (5)), where B is the matrix 
representation of a convolution. This is the simplest case, since 
the inverse (B ff B can be computed in the Fourier 

domain. Although this is an elementary and well-known fact, 
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we include the derivation for the sake of completeness. Assum- 
ing that the convolution is periodic (other boundary conditions 
can be addressed with minor changes), B is a block-circulant 
matrix with circulant blocks which can be factorized as 

B = U ff DU, (26) 

where U is the matrix that represents the 2D discrete Fourier 
transform (DFT), XJ H = U _1 is its inverse (U is unitary, i.e., 
UXJ H = U ff U = I), and D is a diagonal matrix containing 
the DFT coefficients of the convolution operator represented 
by B. Thus, 

(B ff B + /il) _1 = (U H D*DU + /iU if U) _1 (27) 

= U H (lUf+fiiy'v, (28) 

where (•)* denotes complex conjugate and |D| 2 the squared 
absolute values of the entries of the diagonal matrix D. Since 
|D| 2 is diagonal, its inversion has linear cost 0(n). The 
products by U and XJ H can be carried out with 0(n log n) cost 
using the FFT algorithm. The expression in (28) is a Wiener 
filter in the frequency domain. 

2) Deconvolution with Frame-Based Synthesis Prior: 
In this case, we have a problem of the form (1), i.e., 
A = BW, thus the inversion that needs to be performed is 

(W ff B ff BW + /i i) . Assuming that B represents a (pe- 
riodic) convolution, this inversion may be sidestepped under 
the assumption that matrix W corresponds to a normalized 
tight frame (a Parseval frame), i.e., WW H = I. Applying 
the Sherman-Morrison-Woodbury (SMW) matrix inversion 
formula yields 

(w ff B fl BW + ,tl) 1 = - (I-W H B H (BB ff + M l) BW). 

F 

Let's focus on the term F = B H (BB H + fil)^ 1 B; using 
the factorization (26), we have 

F = U ff D*(|D| 2 +/il)~ 1 DU. (29) 

Since all the matrices in D* (|D| 2 + lit) 1 D are diagonal, 
this expression can be computed with 0(n) cost, while the 
products by U and \J H can be computed with 0(n log n) cost 
using the FFT. Consequently, products by matrix F (defined 
in (29)) have O(nlogn) cost. 

Defining r k = (A H y + /xx' fc ) = (W H B H y + fi x' fe ), 
allows writing (24) compactly as 

x fe+1 = - (r fc - W T FW r fe ) . (30) 
H 

Notice that multiplication by F corresponds to applying an 
image filter in the Fourier domain. Finally, notice also that 
the term B H W H y can be precomputed, as it doesn't change 
during the algorithm. 

The leading cost of each application of (30) will be either 
0(n log n) or the cost of the products by W H and W. For 
most tight frames used in image processing, these products 
correspond to direct and inverse transforms for which fast al- 
gorithms exist. For example, when W H and W are the inverse 
and direct translation-invariant wavelet transforms, these prod- 
ucts can be computed using the undecimated wavelet transform 



with O(nlogn) total cost [32]. Curvelets also constitute a 
Parseval frame for which fast 0(n log n) implementations 
of the forward and inverse transform exist [7]. Yet another 
example of a redundant Parseval frame is the complex wavelet 
transform, which has 0(n) computational cost [31], [41]. In 
conclusion, for a large class of choices of W, each iteration 
of the SALSA algorithm has 0(n\ogn) cost. 

3) Missing Pixels: Image Inpainting: In the analysis prior 
case (TV-based), we have A = B, where the observation 
matrix B models the loss of some image pixels. Matrix A = B 
is thus an m x n binary matrix, with m < n, which can be 
obtained by taking a subset of rows of an identity matrix. Due 
to its particular structure, this matrix satisfies BB T — I. Using 
this fact together with the SMW formula leads to 

(B T B + /il) _1 = - (i--J—B t b). (31) 

Since B T B is equal to an identity matrix with some zeros 
in the diagonal (corresponding to the positions of the missing 
observations), the matrix in (31) is diagonal with elements ei- 
ther equal to l/(/i+ 1) or Consequently, (24) corresponds 
simply to multiplying (B H y + ^x' fe ) by this diagonal matrix, 
which is an 0(n) operation. 

In the synthesis prior case, we have A = BW, where 
B is the binary sub-sampling matrix defined in the previous 
paragraph. Using the SMW formula yet again, and the fact 
that BB T = I, we have 

fwVBW + ^ill 1 =-I--^-WVbW. (32) 
V / n 1 + fi 

As noted in the previous paragraph, B T B is equal to an 
identity matrix with zeros in the diagonal (corresponding to 
the positions of the missing observations), i.e., it is a binary 
mask. Thus, the multiplication by W fl B ff BW corresponds 
to synthesizing the image, multiplying it by this mask, and 
computing the representation coefficients of the result. In 
conclusion, the cost of (24) is again that of the products by 
W and W H , usually 0{n log n). 

4) Partial Fourier Observations: MRI Reconstruction.: The 
final case considered is that of partial Fourier observations, 
which is used to model magnetic resonance image (MRI) 
acquisition [33], and has been the focus of much recent interest 
due to its connection to compressed sensing [8], [9], [16]. In 
the TV-regularized case, the observation matrix has the form 
A = BU, where B is an m x n binary matrix, with m < n, 
similar to the one in the missing pixels case (it is formed by 
a subset of rows of an identity matrix), and U is the DFT 
matrix. This case is similar to (32), with U and XJ H instead 
of W and W fl , respectively. The cost of (24) is again that 
of the products by U and U H , i.e., 0(n\ogn) if we use the 
FFT. 

In the synthesis case, the observation matrix has the form 
A = BUW. Clearly, the case is again similar to (32), but 
with UW and W H XJ H instead of W and W ff , respectively. 
Again, the cost of (24) is 0(n log n), if the FFT is used to 
compute the products by U and XJ H and fast frame transforms 
are used for the products by W and W H . 
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IV. Experiments 

In this section, we report results of experiments aimed 
at comparing the speed of SALSA with that of the current 
state of the art methods (all of which are freely available 
online): TwIST 2 [5], SpaRSA 3 [45], and FISTA 4 [4]. We 
consider three standard and often studied imaging inverse 
problems: image deconvolution (using both wavelet and TV- 
based regularization); image restoration from missing samples 
(inpainting); image reconstruction from partial Fourier obser- 
vations, which (as mentioned above) has been the focus of 
much recent interest due to its connection with compressed 
sensing and the fact that it models MRI acquisition [33]. All 
experiments were performed using MATLAB for Windows 
XP, on a desktop computer equipped with an Intel Pentium-IV 
3.0 GHz processor and 1.5GB of RAM. To compare the speed 
of the algorithms, in a way that is as independent as possible 
from the different stopping criteria, we first run SALSA and 
then the other algorithms until they reach the same value of the 
objective function. The value of fi for fastest convergence was 
found to differ (though not very much) in each case, but a good 
rule of thumb, adopted in all the experiments, is /i = O.It. 

TABLE I 

Details of the image deconvolution experiments. 



Objective function 0-5lly-Axlc+" Ibc] 



Experiment 


blur kernel 


a 2 


1 


9x9 uniform 


0.56 2 


2A 


Gaussian 


2 


2B 


Gaussian 


8 


3A 


h i:j =l/(l+i 2 +i 2 ) 


2 


3B 


hij = l/(l + i 2 +j 2 ) 


8 



A. Image Deblurring with wavelets 

We consider five benchmark deblurring problems [22], 
summarized in Table I, all on the well-known Cameraman 
image. The regularizer is <f>(f3) = ||/3||i, thus ^ T< j, is an 
element-wise soft threshold. The blur operator B is applied 
via the FFT The regularization parameter r is hand tuned in 
each case for best improvement in SNR, so that the comparison 
is carried out in the regime that is relevant in practice. Since 
the restored images are visually indistinguishable from those 
obtained in [22], and the SNR improvements are also very 
similar, we simply report computation times. 

In the first set of experiments, W is a redundant Haar 
wavelet frame with four levels. The CPU times taken by each 
of the algorithms are presented in Table II. In the second 
set of experiments, W is an orthogonal Haar wavelet basis; 
the results are reported in Table III. To visually illustrate the 
relative speed of the algorithms, Figures 1 and 2 plot the 
evolution of the objective function (see Eq. (1)), versus time, 
in experiments 1, 2B, and 3 A, for redundant and orthogonal 
wavelets, respectively. 

-Available at http://www.lx.it.pt/-bioucas/code/TwIST_ 
vl . zip 

3 Available at http : / /www . lx . it . pt/ ~mtf / SpaRSA/ 
4 Available at http://iew3.technion.ac.il/~becka/papers/ 
wavelet_FISTA. zip 



-twist 

FISTA 
- 'SpaRSA 
-SALSA 



(a) 




(b) 



Objective function O.SIIy-Axir+Mlxl 



-TwIST 
■ ■ FISTA 
- SpaRSA 
-SALSA 



(c) 

Fig. 1. Objective function evolution (redundant wavelets): (a) experiment 
1A; (b) experiment 2B; (c) experiment 3A. 



B. Image Deblurring with Total Variation 

The same five image deconvolution problems listed in Ta- 
ble I were also addressed using total variation (TV) regulariza- 
tion (more specifically, the isotropic discrete total variation, as 
defined in [10]). The corresponding Moreau proximal mapping 
is computed using 5 iterations of Chambolle's algorithm [10]. 

The CPU times taken by SALSA, TwIST, SpaRSA, and 
FISTA are listed in Table IV. The evolutions of the objective 
functions (for experiments 1, 2B, and 3A) are plotted in 
Figure 3. 

We can conclude from Tables II, III, and IV that, in image 
deconvolution problems, both with wavelet-based and TV- 
based regularization, SALSA is always clearly faster than the 
fastest of the other competing algorithms. 
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TABLE II 

Image deblurring with redundant wavelets: CPU times (in 

SECONDS). 



TABLE III 

Image deblurring with orthogonal wavelets: CPU times (in 

SECONDS). 



Experiment 


TwIST 


SpARSA 


FISTA 


SALSA 


1 


38.5781 


53.4844 


98.2344 


2.26563 


2A 


33.8125 


42.7656 


65.3281 


4.60938 


2B 


35.2031 


70.7031 


112.109 


12.0313 


3A 


20.4688 


13.3594 


32.2969 


2.67188 


3B 


9.0625 


5.8125 


18.0469 


2.07813 


io\ 


Objective 


function 0.5lly-Axll^+ 


Until, 





-TwIST 
■ ■ FISTA 
-■SpaRSA 
- SALSA 



Experiment 


TwIST 


SpARSA 


FISTA 


SALSA 


1 


16.5156 


39.6094 


16.8281 


2.23438 


2A 


10.1406 


16.3438 


15.9531 


1.375 


2B 


5.10938 


7.96875 


5.3125 


0.640625 


3A 


3.67188 


5.23438 


7.46875 


1.03125 


3B 


2.57813 


2.64063 


3.625 


0.5625 



Objective function G-Slly-Axlf+a, * TV W 



— TwIST 

- - FISTA 
- SpaRSA 

- -SALSA 



(a) 

Objective function O.SIIy-Axlf+Mlxll 



-TwIST 
■ ■ FISTA 
- SpaRSA 
-SALSA 



(a) 



Objective function 0-5lly— AxlC+X * 00 



-TwIST 
■ ■ FISTA 
-SpaRSA 
-SALSA 




(b) 

Objective function 0.5lly-Axl6+jUbtI] 



- TwIST 
FISTA 

- SpaRSA 

- SALSA 



(b) 

tion 0.5lly-Axli;+X. -J» (x) 




-TwIST 
■ ' FISTA 

- SpaRSA 

- SALSA 



(O 



(c) 



Fig. 2. Objective function evolution (orthogonal wavelets): (a) experiment Fi §- 3 - Ima § e deblurring with TV regularization - Objective function 

evolution: (a) 9 X 9 uniform blur, a = 0.56; (b) Gaussian blur, a = 8; 



1A; (b) experiment 2B; (c) experiment 3A. 



(c) hij = 1/(1 + i 2 + j 2 ) blur, a 2 = 2. 



C. MRI Image Reconstruction 

We consider the problem of reconstructing the 128 x 128 
Shepp-Logan phantom (shown in Figure 4(a)) from a limited 
number of radial lines (22, in our experiments, as shown in 
Figure 4(b)) of its 2D discrete Fourier transform. The projec- 
tions are also corrupted with circular complex Gaussian noise, 



with variance a 1 — 0.5 x 10~ 3 . We use TV regularization 
(as described in Subsection IV-B), with the corresponding 
Moreau proximal mapping implemented by 40 iterations of 
Chambolle's algorithm [10]. 

Table V shows the CPU times, while Figure 5 plots the 
evolution of the objective function over time. Figure 4(c) 
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TV-Based Image deblurring: CPU Times (in seconds). 



Experiment 


TwIST 


SpARSA 


FISTA 


SALSA 


1 


63.2344 


80.0469 


346.734 


11.2813 


2A 


19.1563 


24.1094 


34.1406 


4.79688 


2B 


10.9375 


7.75 


29.0156 


2.46875 


3A 


13.9688 


12.4375 


35.2969 


2.79688 


3B 


10.9531 


7.75 


28.3438 


2.78125 



shows the estimate obtained using SALSA (the others are, 
naturally, visually indistinguishable). Again, we may conclude 
that SALSA is considerably faster than the other three algo- 
rithms, while achieving comparable values of mean squared 
error of the reconstructed image. 

TABLE V 

MRI RECONSTRUCTION: COMPARISON OF THE VARIOUS ALGORITHMS. 





TwIST 


SpARSA 


FISTA 


SALSA 


Iterations 
CPU time (seconds) 
MSE 


1002 

529.297 

4.384e-7 


1001 

328.688 

6.033e-5 


1000 

390.75 

4.644e-7 


53 

76.5781 
5.817e-7 



D. Image Inpainting 

Finally, we consider an image inpainting problem, as ex- 
plained in Section III-C. The original image is again the 
Cameraman, and the observation consists in loosing 40% of 
its pixels, as shown in Figure 6. The observations are also 
corrupted with Gaussian noise (with an SNR of 40 dB). 
The regularizer is again TV implemented by 20 iterations of 
Chambolle's algorithm. 

The image estimate obtained by SALSA is shown in 
Figure 6, with the original also shown for comparison. The 
estimates obtained using TwIST and FISTA were visually very 
similar. Table VI compares the performance of SALSA with 
that of TwIST and FISTA and Figure 7 shows the evolution 
of the objective function for each of the algorithms. Again, 
SALSA is considerably faster than the alternative algorithms. 

TABLE VI 

Image inpainting: Comparison of the various algorithms. 





TwIST 


FISTA 


SALSA 


Iterations 


302 


300 


33 


CPU time (seconds) 


305 


228 


23 


MSE 


105 


101 


99.1 


ISNR (dB) 


18.4 


18.5 


18.6 



V. Conclusions 

We have presented a new algorithm for solving the un- 
constrained optimization formulation of regularized image 
reconstruction/restoration. The approach, which can be used 
with different types of regularization (wavelet-based, total 
variation), is based on a variable splitting technique which 
yields an equivalent constrained problem. This constrained 
problem is then addressed using an augmented Lagrangian 
method, more specifically, the alternating direction method of 
multipliers (ADMM). The algorithm uses a regularized version 
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(c) 

Fig. 4. MRI reconstruction: (a) 128 X 128 Shepp Logan phantom; (b) Mask 
with 22 radial lines; (c) image estimated using SALSA. 

of the Hessian of the £ 2 data-fidelity term, which can be com- 
puted efficiently for several classes of problems. Experiments 
on a set of standard image recovery problems (deconvolution, 
MRI reconstruction, inpainting) have shown that the proposed 
algorithm (termed SALSA, for split augmented Lagrangian 
shrinkage algorithm) is faster than previous state-of-the-art 
methods. Current and future work involves using a similar 
approach to solve constrained formulations of the forms (3) 
and (4). 
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